-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Add note about istio-cni security #16749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Keith Mattix II <[email protected]>
Left 2 comments that may be worthwhile pointing out. Otherwise LGTM. |
Signed-off-by: Keith Mattix II <[email protected]>
Signed-off-by: Keith Mattix II <[email protected]>
@@ -40,6 +40,10 @@ The implications of this are discussed [below](#node-compromise). | |||
Because this consolidates the elevated privileges required to setup networking into a single pod, rather than *every* pod, | |||
this option is generally recommended. | |||
|
|||
#### Ambient Mode | |||
|
|||
In ambient mode, the Istio CNI plugin (and the associated node agent) manages mesh enrollment for pods living on its node. Due to limitations in the Kubernetes API, it is not currently possible for the CNI plugin or its node agent to prevent pods from being scheduled on the node before the CNI plugin is installed and configured. This can occur even if using node cordons + taints as described [in our upgrade documentation](/docs/ambient/upgrade/helm#cni-node-agent). In these rare cases (e.g. on node restart or new node scale out), it is possible that a pod that is labeled for mesh enrollment may come up before the CNI's traffic redirection rules are applied, meaning that policies won't be enforced until after the CNI comes up and that pod is restarted. The Istio community is working with [various](https://github.com/containernetworking/cni/pull/1052) [upstream](https://github.com/kubernetes/kubernetes/issues/130594) communities to address this limitation, but in the meantime, you can enable [owned CNI mode](https://github.com/jaellio/istio/blob/master/releasenotes/notes/55968.yaml) to mitigate these race conditions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ambient mode, the Istio CNI plugin (and the associated node agent) manages mesh enrollment for pods living on its node. Due to limitations in the Kubernetes API, it is not currently possible for the CNI plugin or its node agent to prevent pods from being scheduled on the node before the CNI plugin is installed and configured. This can occur even if using node cordons + taints as described [in our upgrade documentation](/docs/ambient/upgrade/helm#cni-node-agent). In these rare cases (e.g. on node restart or new node scale out), it is possible that a pod that is labeled for mesh enrollment may come up before the CNI's traffic redirection rules are applied, meaning that policies won't be enforced until after the CNI comes up and that pod is restarted. The Istio community is working with [various](https://github.com/containernetworking/cni/pull/1052) [upstream](https://github.com/kubernetes/kubernetes/issues/130594) communities to address this limitation, but in the meantime, you can enable [owned CNI mode](https://github.com/jaellio/istio/blob/master/releasenotes/notes/55968.yaml) to mitigate these race conditions. | |
In ambient mode, the Istio CNI plugin (and the associated node agent) manages mesh enrollment for pods living on its node. | |
Due to limitations in the Kubernetes API, it is not currently possible for the CNI plugin or its node agent to prevent pods from being scheduled on the node before the CNI plugin is installed and configured. This can occur even if [using node cordons and taints](/docs/ambient/upgrade/helm#cni-node-agent). In rare cases (e.g. on node restart or new node scale-out), it is possible that a pod that is labeled for mesh enrollment may come up before the CNI's traffic redirection rules are applied, meaning that policies won't be enforced until after the CNI comes up and that pod is restarted. | |
The Istio community is working with [the CNI](https://github.com/containernetworking/cni/pull/1052) and [Kubernetes communities](https://github.com/kubernetes/kubernetes/issues/130594) to address this limitation, but in the meantime, you can enable [owned CNI mode](https://github.com/istio/istio/blob/master/releasenotes/notes/55968.yaml) to mitigate these race conditions. |
Reformatted a little for clarity. I wouldn't have thought a pod restart was required anywhere along the way, but one is mentioned? (In the context of the node taint controller, @ilrudie told me that it was only a problem until the CNI agent claimed the pod)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.s. please fast-follow by documenting that owned CNI mode feature somewhere other than a relnote!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah unfortunately a pod restart is required due to the nature of the bug 😓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where should we document an env var based feature like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we really not handle all existing pods when the CNI starts? We used to be able to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keithmattix What are you thoughts on including this update in 1.27.1 after we address the additional bugs above and have an alternative solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we really not handle all existing pods when the CNI starts? We used to be able to do that.
@ilrudie (not positive this is what you meant, so correct me if I misunderstood :) )
Currently, our reconciliation on istio-cni happens before we install the istio cni. binary and config, so if a pod doesn't have a net ns when the snapshot is taken for reconciliation (we ignore it in this case) and has a net ns configured before istio-cni is installed then the pod will only be meshed after a restart.
We could fix this by changing the ordering of reconciliation and install as discussed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are you thoughts on including this update in 1.27.1 after we address the additional bugs above and have an alternative solution?
This sounds good to me - I wasn't aware/didn't fully understand the reconciliation aspect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where should we document an env var based feature like this?
Somewhere in the ambient documentation relating to the CNI.
Perhaps https://istio.io/latest/docs/ambient/architecture/traffic-redirection/ for now?
We need to separate out the page under the "sidecar" section at some point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaellio, that makes sense. I thought we did reconcile after installing the binary but if we don't then I can totally see how we'd sometimes wind up in limbo like this.
Thanks for the explanation.
Co-authored-by: Craig Box <[email protected]>
Description
Add note about Istio CNI's security implications in ambient mode
Reviewers